Overview

Dataset statistics

Number of variables: 13
Number of observations: 2959
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 289.1 KiB
Average record size in memory: 100.0 B

Variable types

NUM: 13

Reproduction

Analysis started: 2021-05-24 18:02:18.320903
Analysis finished: 2021-05-24 18:03:14.307562
Duration: 55.99 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

total_quantity is highly correlated with gross_revenue [High correlation]
gross_revenue is highly correlated with total_quantity [High correlation]
returns is highly correlated with avg_ticket and 1 other field [High correlation]
avg_ticket is highly correlated with returns and 1 other field [High correlation]
avg_basket_size is highly correlated with avg_ticket and 1 other field [High correlation]
avg_ticket is highly skewed (γ1 = 53.35524577) [Skewed]
returns is highly skewed (γ1 = 51.74099745) [Skewed]
avg_basket_size is highly skewed (γ1 = 44.66349716) [Skewed]
df_index has unique values [Unique]
customer_id has unique values [Unique]
avg_ticket has unique values [Unique]
recency_days has 34 (1.1%) zeros [Zeros]
returns has 1481 (50.1%) zeros [Zeros]
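The γ1 values in these warnings measure how asymmetric each distribution is. The sketch below shows one way to compute such a value: the adjusted Fisher-Pearson coefficient (G1), which is the estimator behind pandas `Series.skew()`. It assumes pandas-profiling v2.8.0 reports that same estimator. `adjusted_skewness` is a hypothetical helper name.

```python
import math

def adjusted_skewness(values):
    """Adjusted Fisher-Pearson skewness G1 (the estimator pandas Series.skew uses)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(values) / n
    # Central moments with divisor n (biased estimates).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        return 0.0  # constant data: treat as unskewed
    g1 = m3 / m2 ** 1.5
    # Small-sample correction that turns g1 into G1.
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

print(adjusted_skewness([1, 2, 3, 4, 5]))  # 0.0 (symmetric data)
print(adjusted_skewness([1, 1, 1, 10]))    # ≈ 2.0 (one large value stretches the right tail)
```

Values above 40, as for avg_ticket, returns and avg_basket_size, mean that a handful of extreme customers dominate the right tail. The extreme-value lists for those variables show the same thing: avg_ticket, for example, has a maximum of 56157.5 against a median of about 18.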

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct count: 2959
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2307.841838458939
Minimum: 0
Maximum: 5685
Zeros: 1
Zeros (%): < 0.1%
Memory size: 23.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 184.9
Q1: 924.5
median: 2110
Q3: 3521.5
95-th percentile: 5020.4
Maximum: 5685
Range: 5685
Interquartile range (IQR): 2597

Descriptive statistics

Standard deviation: 1548.743202
Coefficient of variation (CV): 0.6710785707
Kurtosis: -1.007193055
Mean: 2307.841838
Median Absolute Deviation (MAD): 1268
Skewness: 0.3438146073
Sum: 6828904
Variance: 2398605.507

Histogram with fixed size bins (bins=10)

Common values

Value                Count  Frequency (%)
4094                 1      < 0.1%
4540                 1      < 0.1%
4524                 1      < 0.1%
1278                 1      < 0.1%
3323                 1      < 0.1%
1274                 1      < 0.1%
3319                 1      < 0.1%
3317                 1      < 0.1%
1268                 1      < 0.1%
3313                 1      < 0.1%
Other values (2949)  2949   99.7%

Minimum 5 values

Value                Count  Frequency (%)
0                    1      < 0.1%
1                    1      < 0.1%
2                    1      < 0.1%
3                    1      < 0.1%
4                    1      < 0.1%

Maximum 5 values

Value                Count  Frequency (%)
5685                 1      < 0.1%
5675                 1      < 0.1%
5669                 1      < 0.1%
5648                 1      < 0.1%
5644                 1      < 0.1%

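The quantile and descriptive statistics in each variable block follow standard definitions. The stdlib-only sketch below recomputes the main ones for a list of numbers. It rests on three assumptions:

- Quantiles use linear interpolation between order statistics (the pandas/NumPy default).
- The standard deviation is the sample version with an n - 1 divisor (the pandas default).
- MAD is the median absolute deviation from the median, as the report labels it.

`quantile` and `describe_numeric` are hypothetical helpers, not pandas-profiling functions.

```python
import math
import statistics

def quantile(sorted_values, q):
    """Linear-interpolation quantile of already-sorted data (0 <= q <= 1)."""
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac

def describe_numeric(values):
    xs = sorted(values)
    mean = statistics.fmean(xs)
    std = statistics.stdev(xs)  # sample standard deviation (n - 1 divisor)
    median = statistics.median(xs)
    q1, q3 = quantile(xs, 0.25), quantile(xs, 0.75)
    return {
        "min": xs[0],
        "5th percentile": quantile(xs, 0.05),
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "95th percentile": quantile(xs, 0.95),
        "max": xs[-1],
        "range": xs[-1] - xs[0],
        "IQR": q3 - q1,
        "std": std,
        "variance": statistics.variance(xs),
        "CV": std / mean,
        # Median absolute deviation: median of |x - median|.
        "MAD": statistics.median(abs(x - median) for x in xs),
        "sum": sum(xs),
    }

stats = describe_numeric([1, 2, 3, 4, 5])
print(stats["IQR"], stats["MAD"])  # 2.0 1
```

For df_index above, IQR = Q3 - Q1 = 3521.5 - 924.5 = 2597 and CV = 1548.743202 / 2307.841838 ≈ 0.671, both matching the reported values.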
customer_id
Real number (ℝ≥0)

UNIQUE

Distinct count: 2959
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15274.081446434606
Minimum: 12347
Maximum: 18287
Zeros: 0
Zeros (%): 0.0%
Memory size: 11.6 KiB

Quantile statistics

Minimum: 12347
5-th percentile: 12623.7
Q1: 13806.5
median: 15227
Q3: 16769
95-th percentile: 17964.1
Maximum: 18287
Range: 5940
Interquartile range (IQR): 2962.5

Descriptive statistics

Standard deviation: 1717.644411
Coefficient of variation (CV): 0.1124548417
Kurtosis: -1.205531349
Mean: 15274.08145
Median Absolute Deviation (MAD): 1484
Skewness: 0.02900451519
Sum: 45196007
Variance: 2950302.321

Histogram with fixed size bins (bins=10)

Common values

Value                Count  Frequency (%)
14335                1      < 0.1%
15679                1      < 0.1%
15689                1      < 0.1%
17736                1      < 0.1%
15687                1      < 0.1%
17734                1      < 0.1%
13636                1      < 0.1%
12722                1      < 0.1%
13634                1      < 0.1%
15681                1      < 0.1%
Other values (2949)  2949   99.7%

Minimum 5 values

Value                Count  Frequency (%)
12347                1      < 0.1%
12348                1      < 0.1%
12352                1      < 0.1%
12356                1      < 0.1%
12358                1      < 0.1%

Maximum 5 values

Value                Count  Frequency (%)
18287                1      < 0.1%
18283                1      < 0.1%
18282                1      < 0.1%
18277                1      < 0.1%
18276                1      < 0.1%

gross_revenue
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 2953
Unique (%): 99.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2756.250922608989
Minimum: 15.0
Maximum: 279138.01999999984
Zeros: 0
Zeros (%): 0.0%
Memory size: 23.1 KiB

Quantile statistics

Minimum: 15
5-th percentile: 233.229
Q1: 573.085
median: 1091.39
Q3: 2312.09
95-th percentile: 7237.701
Maximum: 279138.02
Range: 279123.02
Interquartile range (IQR): 1739.005

Descriptive statistics

Standard deviation: 10597.64586
Coefficient of variation (CV): 3.844949592
Kurtosis: 352.8143682
Mean: 2756.250923
Median Absolute Deviation (MAD): 674.75
Skewness: 16.75139769
Sum: 8155746.48
Variance: 112310097.8

Histogram with fixed size bins (bins=10)

Common values

Value                Count  Frequency (%)
379.65               2      0.1%
731.9                2      0.1%
734.94               2      0.1%
745.06               2      0.1%
331                  2      0.1%
533.33               2      0.1%
1842.14              1      < 0.1%
618.52               1      < 0.1%
899.63               1      < 0.1%
9430.52              1      < 0.1%
Other values (2943)  2943   99.5%

Minimum 5 values

Value                Count  Frequency (%)
15                   1      < 0.1%
36.56                1      < 0.1%
45                   1      < 0.1%
52                   1      < 0.1%
52.2                 1      < 0.1%

Maximum 5 values

Value                Count  Frequency (%)
279138.02            1      < 0.1%
259657.3             1      < 0.1%
194550.79            1      < 0.1%
168472.5             1      < 0.1%
140450.72            1      < 0.1%

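Kurtosis figures such as 352.8143682 for gross_revenue are easiest to read as excess kurtosis. Excess kurtosis is 0 for a normal distribution and -1.2 for a uniform one; compare df_index's -1.007. The sketch below implements the bias-corrected estimator that pandas `Series.kurt()` returns. That pandas-profiling v2.8.0 reports exactly this estimator is an assumption, and `excess_kurtosis` is a hypothetical name.

```python
def excess_kurtosis(values):
    """Bias-corrected excess kurtosis G2 (the estimator pandas Series.kurt uses)."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(values) / n
    s2 = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    s4 = sum((x - mean) ** 4 for x in values)  # sum of fourth-power deviations
    if s2 == 0:
        return 0.0  # constant data
    numerator = n * (n + 1) * (n - 1) * s4
    denominator = (n - 2) * (n - 3) * s2 ** 2
    adjustment = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return numerator / denominator - adjustment

print(excess_kurtosis([1, 2, 3, 4, 5]))            # ≈ -1.2 (flat, uniform-like)
print(excess_kurtosis([0, 0, 0, 0, 0, 0, 0, 10]))  # ≈ 8.0 (one extreme value)
```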
recency_days
Real number (ℝ≥0)

ZEROS

Distinct count: 272
Unique (%): 9.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 64.1753970936127
Minimum: 0.0
Maximum: 373.0
Zeros: 34
Zeros (%): 1.1%
Memory size: 23.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 11
median: 31
Q3: 81
95-th percentile: 242
Maximum: 373
Range: 373
Interquartile range (IQR): 70

Descriptive statistics

Standard deviation: 77.70318513
Coefficient of variation (CV): 1.21079399
Kurtosis: 2.789481599
Mean: 64.17539709
Median Absolute Deviation (MAD): 26
Skewness: 1.800506206
Sum: 189895
Variance: 6037.784979

Histogram with fixed size bins (bins=10)

Common values

Value                Count  Frequency (%)
1                    99     3.3%
4                    87     2.9%
3                    85     2.9%
2                    85     2.9%
8                    76     2.6%
10                   67     2.3%
9                    66     2.2%
7                    65     2.2%
17                   64     2.2%
16                   55     1.9%
Other values (262)   2210   74.7%

Minimum 5 values

Value                Count  Frequency (%)
0                    34     1.1%
1                    99     3.3%
2                    85     2.9%
3                    85     2.9%
4                    87     2.9%

Maximum 5 values

Value                Count  Frequency (%)
373                  2      0.1%
372                  4      0.1%
371                  1      < 0.1%
368                  1      < 0.1%
366                  4      0.1%

number_of_purchases
Real number (ℝ≥0)

Distinct count: 56
Unique (%): 1.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.738763095640419
Minimum: 1.0
Maximum: 206.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 23.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 4
Q3: 6
95-th percentile: 17
Maximum: 206
Range: 205
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 8.867384912
Coefficient of variation (CV): 1.545173544
Kurtosis: 190.4643752
Mean: 5.738763096
Median Absolute Deviation (MAD): 2
Skewness: 10.75880319
Sum: 16981
Variance: 78.63051517

Histogram with fixed size bins (bins=10)

Common values

Value                Count  Frequency (%)
2                    784    26.5%
3                    499    16.9%
4                    393    13.3%
5                    237    8.0%
1                    181    6.1%
6                    173    5.8%
7                    138    4.7%
8                    98     3.3%
9                    69     2.3%
10                   55     1.9%
Other values (46)    332    11.2%

Minimum 5 values

Value                Count  Frequency (%)
1                    181    6.1%
2                    784    26.5%
3                    499    16.9%
4                    393    13.3%
5                    237    8.0%

Maximum 5 values

Value                Count  Frequency (%)
206                  1      < 0.1%
199                  1      < 0.1%
124                  1      < 0.1%
97                   1      < 0.1%
91                   2      0.1%

total_quantity
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 1668
Unique (%): 56.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1613.1263940520446
Minimum: 1.0
Maximum: 196844.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 23.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 103
Q1: 298
median: 642
Q3: 1401.5
95-th percentile: 4410.9
Maximum: 196844
Range: 196843
Interquartile range (IQR): 1103.5

Descriptive statistics

Standard deviation: 5897.012223
Coefficient of variation (CV): 3.655641768
Kurtosis: 464.5308241
Mean: 1613.126394
Median Absolute Deviation (MAD): 422
Skewness: 17.83100874
Sum: 4773241
Variance: 34774753.15

Histogram with fixed size bins (bins=10)

Common values

Value                Count  Frequency (%)
310                  11     0.4%
150                  9      0.3%
288                  8      0.3%
88                   8      0.3%
246                  8      0.3%
84                   8      0.3%
260                  8      0.3%
493                  7      0.2%
516                  7      0.2%
272                  7      0.2%
Other values (1658)  2878   97.3%

Minimum 5 values

Value                Count  Frequency (%)
1                    1      < 0.1%
2                    1      < 0.1%
12                   1      < 0.1%
16                   1      < 0.1%
17                   1      < 0.1%

Maximum 5 values

Value                Count  Frequency (%)
196844               1      < 0.1%
80997                1      < 0.1%
80263                1      < 0.1%
77373                1      < 0.1%
69993                1      < 0.1%

product_variety
Real number (ℝ≥0)

Distinct count: 341
Unique (%): 11.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 79.53092260898953
Minimum: 1.0
Maximum: 1786.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 23.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 7
Q1: 26
median: 52
Q3: 101
95-th percentile: 234
Maximum: 1786
Range: 1785
Interquartile range (IQR): 75

Descriptive statistics

Standard deviation: 96.93923869
Coefficient of variation (CV): 1.21888739
Kurtosis: 82.3396579
Mean: 79.53092261
Median Absolute Deviation (MAD): 33
Skewness: 6.389759612
Sum: 235332
Variance: 9397.215997

Histogram with fixed size bins (bins=10)

Common values

Value                Count  Frequency (%)
24                   44     1.5%
18                   39     1.3%
37                   39     1.3%
28                   39     1.3%
26                   37     1.3%
25                   37     1.3%
14                   36     1.2%
30                   36     1.2%
15                   35     1.2%
23                   35     1.2%
Other values (331)   2582   87.3%

Minimum 5 values

Value                Count  Frequency (%)
1                    25     0.8%
2                    15     0.5%
3                    20     0.7%
4                    19     0.6%
5                    33     1.1%

Maximum 5 values

Value                Count  Frequency (%)
1786                 1      < 0.1%
1766                 1      < 0.1%
1322                 1      < 0.1%
1118                 1      < 0.1%
884                  1      < 0.1%

avg_ticket
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
UNIQUE

Distinct count: 2959
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 51.92560756198357
Minimum: 2.150588235294117
Maximum: 56157.5
Zeros: 0
Zeros (%): 0.0%
Memory size: 23.1 KiB

Quantile statistics

Minimum: 2.150588235
5-th percentile: 4.940997956
Q1: 13.11688889
median: 17.95658654
Q3: 24.91514815
95-th percentile: 90.48283333
Maximum: 56157.5
Range: 56155.34941
Interquartile range (IQR): 11.79825926

Descriptive statistics

Standard deviation: 1038.678599
Coefficient of variation (CV): 20.0032055
Kurtosis: 2881.050162
Mean: 51.92560756
Median Absolute Deviation (MAD): 5.964070815
Skewness: 53.35524577
Sum: 153647.8728
Variance: 1078853.232

Histogram with fixed size bins (bins=10)

Common values

Value                Count  Frequency (%)
18.89583333          1      < 0.1%
17.70607143          1      < 0.1%
26.79906433          1      < 0.1%
25.69635417          1      < 0.1%
20.43421053          1      < 0.1%
20.63668675          1      < 0.1%
16.83756098          1      < 0.1%
20.2366242           1      < 0.1%
13.11444444          1      < 0.1%
114.143964           1      < 0.1%
Other values (2949)  2949   99.7%

Minimum 5 values

Value                Count  Frequency (%)
2.150588235          1      < 0.1%
2.4325               1      < 0.1%
2.462371134          1      < 0.1%
2.511241379          1      < 0.1%
2.515333333          1      < 0.1%

Maximum 5 values

Value                Count  Frequency (%)
56157.5              1      < 0.1%
4453.43              1      < 0.1%
3202.92              1      < 0.1%
1687.2               1      < 0.1%
952.9875             1      < 0.1%

avg_recency_days
Real number (ℝ≥0)

Distinct count: 1258
Unique (%): 42.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 67.52846579627077
Minimum: 1.0
Maximum: 366.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 23.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 8
Q1: 26
median: 48.66666667
Q3: 85.70833333
95-th percentile: 201
Maximum: 366
Range: 365
Interquartile range (IQR): 59.70833333

Descriptive statistics

Standard deviation: 63.56934557
Coefficient of variation (CV): 0.9413710917
Kurtosis: 4.875917435
Mean: 67.5284658
Median Absolute Deviation (MAD): 26.33333333
Skewness: 2.061314407
Sum: 199816.7303
Variance: 4041.061696

Histogram with fixed size bins (bins=10)

Common values

Value                Count  Frequency (%)
14                   24     0.8%
4                    22     0.7%
70                   21     0.7%
7                    20     0.7%
35                   19     0.6%
49                   18     0.6%
11                   17     0.6%
46                   17     0.6%
42                   16     0.5%
28                   16     0.5%
Other values (1248)  2769   93.6%

Minimum 5 values

Value                Count  Frequency (%)
1                    15     0.5%
1.5                  1      < 0.1%
2                    12     0.4%
2.5                  1      < 0.1%
2.601398601          1      < 0.1%

Maximum 5 values

Value                Count  Frequency (%)
366                  1      < 0.1%
365                  1      < 0.1%
363                  1      < 0.1%
362                  1      < 0.1%
357                  2      0.1%

returns
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct count: 208
Unique (%): 7.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 61.20378506252112
Minimum: 0.0
Maximum: 80995.0
Zeros: 1481
Zeros (%): 50.1%
Memory size: 23.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 9
95-th percentile: 96.2
Maximum: 80995
Range: 80995
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 1514.769178
Coefficient of variation (CV): 24.74959966
Kurtosis: 2758.400827
Mean: 61.20378506
Median Absolute Deviation (MAD): 0
Skewness: 51.74099745
Sum: 181102
Variance: 2294525.662

Histogram with fixed size bins (bins=10)

Common values

Value                Count  Frequency (%)
0                    1481   50.1%
1                    164    5.5%
2                    147    5.0%
3                    105    3.5%
4                    89     3.0%
6                    78     2.6%
5                    61     2.1%
12                   50     1.7%
7                    43     1.5%
8                    43     1.5%
Other values (198)   698    23.6%

Minimum 5 values

Value                Count  Frequency (%)
0                    1481   50.1%
1                    164    5.5%
2                    147    5.0%
3                    105    3.5%
4                    89     3.0%

Maximum 5 values

Value                Count  Frequency (%)
80995                1      < 0.1%
9014                 1      < 0.1%
8004                 1      < 0.1%
4427                 1      < 0.1%
3768                 1      < 0.1%

frequency
Real number (ℝ≥0)

Distinct count: 1349
Unique (%): 45.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.06236414246599363
Minimum: 0.005449591280653951
Maximum: 3.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 23.1 KiB

Quantile statistics

Minimum: 0.005449591281
5-th percentile: 0.009428066038
Q1: 0.01775413712
median: 0.02919708029
Q3: 0.05476952305
95-th percentile: 0.2068698161
Maximum: 3
Range: 2.994550409
Interquartile range (IQR): 0.03701538593

Descriptive statistics

Standard deviation: 0.1327145366
Coefficient of variation (CV): 2.12805839
Kurtosis: 127.9863959
Mean: 0.06236414247
Median Absolute Deviation (MAD): 0.01427170716
Skewness: 9.012573411
Sum: 184.5354976
Variance: 0.01761314823

Histogram with fixed size bins (bins=10)

Common values

Value                Count  Frequency (%)
0.1666666667         21     0.7%
0.02777777778        20     0.7%
0.3333333333         20     0.7%
0.09090909091        18     0.6%
0.0625               17     0.6%
0.4                  16     0.5%
0.02380952381        15     0.5%
0.1333333333         15     0.5%
0.25                 15     0.5%
0.03571428571        15     0.5%
Other values (1339)  2787   94.2%

Minimum 5 values

Value                Count  Frequency (%)
0.005449591281       1      < 0.1%
0.005464480874       1      < 0.1%
0.005494505495       1      < 0.1%
0.005509641873       1      < 0.1%
0.005586592179       2      0.1%

Maximum 5 values

Value                Count  Frequency (%)
3                    1      < 0.1%
2                    1      < 0.1%
1.571428571          1      < 0.1%
1.5                  3      0.1%
1                    13     0.4%

avg_basket_size
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct count: 1975
Unique (%): 66.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 249.50441557068999
Minimum: 1.0
Maximum: 40498.5
Zeros: 0
Zeros (%): 0.0%
Memory size: 23.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 44.47222222
Q1: 103.3333333
median: 172.3333333
Q3: 281.5961538
95-th percentile: 598.92
Maximum: 40498.5
Range: 40497.5
Interquartile range (IQR): 178.2628205

Descriptive statistics

Standard deviation: 792.5028622
Coefficient of variation (CV): 3.176307964
Kurtosis: 2252.43763
Mean: 249.5044156
Median Absolute Deviation (MAD): 82.66666667
Skewness: 44.66349716
Sum: 738283.5657
Variance: 628060.7867

Histogram with fixed size bins (bins=10)

Common values

Value                Count  Frequency (%)
100                  11     0.4%
114                  10     0.3%
82                   9      0.3%
86                   9      0.3%
73                   9      0.3%
136                  8      0.3%
60                   8      0.3%
75                   8      0.3%
288                  7      0.2%
105                  7      0.2%
Other values (1965)  2873   97.1%

Minimum 5 values

Value                Count  Frequency (%)
1                    2      0.1%
3.333333333          1      < 0.1%
5.333333333          1      < 0.1%
5.666666667          1      < 0.1%
6.142857143          1      < 0.1%

Maximum 5 values

Value                Count  Frequency (%)
40498.5              1      < 0.1%
6009.333333          1      < 0.1%
4282                 1      < 0.1%
3906                 1      < 0.1%
3868.65              1      < 0.1%

unq_basket_size
Real number (ℝ≥0)

Distinct count: 1004
Unique (%): 33.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.169087074215728
Minimum: 1.0
Maximum: 299.70588235294116
Zeros: 0
Zeros (%): 0.0%
Memory size: 23.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3.497368421
Q1: 10.04166667
median: 17.25
Q3: 27.75
95-th percentile: 56.865
Maximum: 299.7058824
Range: 298.7058824
Interquartile range (IQR): 17.70833333

Descriptive statistics

Standard deviation: 19.4753378
Coefficient of variation (CV): 0.878490744
Kurtosis: 27.939353
Mean: 22.16908707
Median Absolute Deviation (MAD): 8.25
Skewness: 3.508087524
Sum: 65598.32865
Variance: 379.2887824

Histogram with fixed size bins (bins=10)

Common values

Value                Count  Frequency (%)
13                   53     1.8%
14                   39     1.3%
11                   36     1.2%
9                    33     1.1%
20                   33     1.1%
1                    32     1.1%
17                   31     1.0%
18                   30     1.0%
10                   29     1.0%
5                    29     1.0%
Other values (994)   2614   88.3%

Minimum 5 values

Value                Count  Frequency (%)
1                    32     1.1%
1.2                  1      < 0.1%
1.25                 1      < 0.1%
1.333333333          2      0.1%
1.5                  8      0.3%

Maximum 5 values

Value                Count  Frequency (%)
299.7058824          1      < 0.1%
259                  1      < 0.1%
203.5                1      < 0.1%
148                  1      < 0.1%
145                  1      < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. r is also invariant under separate changes in location and (positive) scale of the two variables. As a consequence, if Y is an exact linear function of X, r is +1 or -1 depending only on the sign of the slope, not on how steep the line is.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
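The definition translates directly into a stdlib-only Python function. `pearson_r` is a hypothetical helper name, not part of pandas-profiling. The two sample calls also illustrate the invariance property: exact linear relationships give r = ±1 regardless of the slope's magnitude.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # The 1/(n - 1) factors in the covariance and in each standard deviation cancel.
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [3 * v + 7 for v in x]))  # ≈ 1.0 (steep increasing line)
print(pearson_r(x, [-0.5 * v for v in x]))   # ≈ -1.0 (shallow decreasing line)
```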

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables. It is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the rank variables' standard deviations; in other words, ρ is Pearson's r applied to the ranks.

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. A pair is concordant when X and Y order the two observations the same way and discordant when they order them oppositely. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2.
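The sketch below implements the pair-counting definition in O(n²) time; this form is usually called τa. `kendall_tau_a` is a hypothetical helper. As far as we understand, pandas' `DataFrame.corr(method="kendall")` uses the tie-adjusted τb variant, which gives the same value when neither variable has ties.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / (n * (n - 1) / 2)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1   # both variables order the pair the same way
        elif s < 0:
            discordant += 1   # the variables order the pair oppositely
        # s == 0 is a tie in X or Y and is counted as neither.
    return (concordant - discordant) / (n * (n - 1) / 2)

print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # ≈ 0.667 (5 concordant, 1 discordant pair)
```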

Phik (φk)

Phik (φk) is a newer, practical correlation coefficient that works consistently across categorical, ordinal and interval variables. It captures non-linear dependencies and reverts to the Pearson correlation coefficient when the input follows a bivariate normal distribution. Extensive documentation is available in the phik project's documentation.

Missing values

Sample

First rows

   df_index  customer_id  gross_revenue  recency_days  number_of_purchases  total_quantity  product_variety  avg_ticket  avg_recency_days  returns  frequency  avg_basket_size  unq_basket_size
0         0        17850        5391.21         372.0                 34.0          1733.0             21.0   18.152222         35.500000     40.0   0.486111        50.970588         8.735294
1         1        13047        3232.59          56.0                  9.0          1390.0            105.0   18.904035         27.250000     35.0   0.048780       154.444444        19.000000
2         2        12583        6705.38           2.0                 15.0          5028.0            114.0   28.902500         23.187500     50.0   0.045699       335.200000        15.466667
3         3        13748         948.25          95.0                  5.0           439.0             24.0   33.866071         92.666667      0.0   0.017921        87.800000         5.600000
4         4        15100         876.00         333.0                  3.0            80.0              1.0  292.000000          8.600000     22.0   0.136364        26.666667         1.000000
5         5        15291        4623.30          25.0                 14.0          2102.0             61.0   45.326471         23.200000     29.0   0.054441       150.142857         7.285714
6         6        14688        5630.87           7.0                 21.0          3621.0            148.0   17.219786         18.300000    399.0   0.073569       172.428571        15.571429
7         7        17809        5411.91          16.0                 12.0          2057.0             46.0   88.719836         35.700000     41.0   0.039106       171.416667         5.083333
8         8        15311       60767.90           0.0                 91.0         38194.0            567.0   25.543464          4.144444    474.0   0.315508       419.714286        26.142857
9         9        16098        2005.63          87.0                  7.0           613.0             34.0   29.934776         47.666667      0.0   0.024390        87.571429         9.571429

Last rows

      df_index  customer_id  gross_revenue  recency_days  number_of_purchases  total_quantity  product_variety  avg_ticket  avg_recency_days  returns  frequency  avg_basket_size  unq_basket_size
2949      5610        17254         272.44           4.0                  2.0           252.0            100.0    2.432500              11.0      0.0   0.166667       126.000000             56.0
2950      5616        17727        1060.25          15.0                  1.0           645.0             66.0   16.064394               6.0      6.0   0.285714       645.000000             66.0
2951      5626        17232         421.52           2.0                  2.0           203.0             30.0   11.708889              12.0      0.0   0.153846       101.500000             18.0
2952      5627        17468         137.00          10.0                  2.0           116.0              5.0   27.400000               4.0      0.0   0.400000        58.000000              2.5
2953      5638        13596         697.04           5.0                  2.0           406.0            133.0    4.199036               7.0      0.0   0.250000       203.000000             83.0
2954      5644        14893        1237.85           9.0                  2.0           799.0             72.0   16.956849               2.0      0.0   0.666667       399.500000             36.5
2955      5648        12479         473.20          11.0                  1.0           382.0             30.0   15.773333               4.0     34.0   0.333333       382.000000             30.0
2956      5669        14126         706.13           7.0                  3.0           508.0             14.0   47.075333               3.0     50.0   1.000000       169.333333              5.0
2957      5675        13521        1092.39           1.0                  3.0           733.0            312.0    2.511241               4.5      0.0   0.300000       244.333333            145.0
2958      5685        15060         301.84           8.0                  4.0           262.0             80.0    2.515333               1.0      0.0   2.000000        65.500000             30.0
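In the sample rows, avg_basket_size appears to equal total_quantity / number_of_purchases. For customer 17850, for example, 1733 / 34 ≈ 50.970588. The report does not document how the features were derived, so this relationship is an assumption inferred from the sample. The snippet below checks it against the ten "First rows" values, copied from the sample. It allows for the six-decimal rounding of the display. `basket_size_mismatches` is a hypothetical helper.

```python
# (customer_id, number_of_purchases, total_quantity, avg_basket_size) from "First rows".
ROWS = [
    (17850, 34, 1733, 50.970588),
    (13047, 9, 1390, 154.444444),
    (12583, 15, 5028, 335.200000),
    (13748, 5, 439, 87.800000),
    (15100, 3, 80, 26.666667),
    (15291, 14, 2102, 150.142857),
    (14688, 21, 3621, 172.428571),
    (17809, 12, 2057, 171.416667),
    (15311, 91, 38194, 419.714286),
    (16098, 7, 613, 87.571429),
]

def basket_size_mismatches(rows, tol=1e-6):
    """Return customer_ids whose avg_basket_size differs from total_quantity / number_of_purchases."""
    bad = []
    for customer_id, purchases, quantity, avg_basket in rows:
        # Displayed values are rounded to 6 decimals, so compare within a tolerance.
        if abs(quantity / purchases - avg_basket) > tol:
            bad.append(customer_id)
    return bad

print(basket_size_mismatches(ROWS))  # []
```

An empty list means every sampled row is consistent with the assumed formula. That holds only for these rows and does not confirm the formula for the full dataset.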